@Sparshkhare1306
📋 Summary


This PR ports a GENIE-like model extraction and pruning attack into the PyGIP codebase and adds a small PyTorch-Geometric (PyG) fallback so the project can run smoke demos on machines that do not have a working DGL installation. The goal is to make it straightforward for maintainers to run a quick demonstration of the GENIE attacks inside the PyGIP repo.

Files added / changed (high-level)

  • attacks/genie_model_extraction.py — GENIE-style model extraction attack, adapted to the repo's BaseAttack API.
  • attacks/genie_pruning_attack.py — Pruning attack, likewise adapted to the repo's BaseAttack API.
  • models/gcn_link_predictor.py — Minimal GCN link-prediction model used by the attacks and the demo trainer.
  • pygip/datasets/datasets.py and pygip/datasets/__init__.py — PyG-only fallback dataset loaders: load_ca_hepth, load_c_elegans, and a SimpleDataset wrapper.
  • examples/train_small_predictor.py — Small trainer that saves a demo checkpoint (examples/watermarked_model_demo.pth).
  • examples/run_genie_experiments.py — Example script that runs extraction then pruning and prints metrics.
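
For orientation, the link predictor in models/gcn_link_predictor.py follows the standard GCN-encoder / dot-product-decoder recipe. Here is a minimal NumPy sketch of that math (layer sizes, helper names, and the dot-product decoder are illustrative assumptions, not the file's actual code):

```python
import numpy as np

def gcn_layer(a_hat, h, w):
    # One GCN layer: propagate features over the normalized
    # adjacency, then apply a linear map and ReLU.
    return np.maximum(a_hat @ h @ w, 0.0)

def link_scores(a_hat, x, w1, w2, edges):
    # Two-layer GCN encoder, then a dot-product decoder that
    # scores each candidate edge (u, v).
    z = gcn_layer(a_hat, gcn_layer(a_hat, x, w1), w2)
    return np.array([z[u] @ z[v] for u, v in edges])

rng = np.random.default_rng(0)
n, d = 5, 8                                  # 5 nodes, 8-dim features
a = (rng.random((n, n)) < 0.4).astype(float)
a = np.maximum(a, a.T)                       # symmetrize
np.fill_diagonal(a, 1.0)                     # self-loops
d_inv = np.diag(1.0 / np.sqrt(a.sum(1)))
a_hat = d_inv @ a @ d_inv                    # symmetric normalization
x = rng.random((n, d))
w1, w2 = rng.random((d, 16)), rng.random((16, 16))
scores = link_scores(a_hat, x, w1, w2, [(0, 1), (2, 3)])
print(scores.shape)  # (2,)
```

The actual model is a PyTorch module; this sketch only shows the forward-pass math the attacks rely on.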

🧪 Related Issues

  • Port of GENIE attacks requested for PyGIP integration (no issue number currently; please link if there is an existing issue).
  • This PR is intended to satisfy the request to adapt GENIE-style attack code into the PyGIP format for review and smoke-testing.

✅ Checklist

  • My code follows the project's coding style (best effort; please flag any style issues in review).
  • I have tested the changes and verified that they work (smoke tests run locally; see "How to verify" below).
  • I have added necessary documentation (I added inline comments and example scripts; please advise if you want docs in docs/).
  • I have linked related issues above (none existed; please add if applicable).
  • The PR is made from a feature branch (feat/genie-watermark-ft), not main.

🧠 Additional Context (Important — please read)

Quick reproduction steps (exact commands)

From the repository root:

  1. Make the repo importable (temporary):
     export PYTHONPATH="$(pwd):$PYTHONPATH"
  2. (Optional) Install the package in editable mode for consistent imports:
     pip install -e .
  3. Train the small demo teacher:
     python examples/train_small_predictor.py
     This prints training losses and writes examples/watermarked_model_demo.pth.
  4. Run the GENIE-style demo (extraction + pruning):
     python examples/run_genie_experiments.py --dataset CA-HepTh --model_path examples/watermarked_model_demo.pth
  5. Expected lines in the output:
     Loaded dataset CA-HepTh: nodes=9877 feat_dim=64
     [GenieModelExtraction] Running on device cpu
     Extraction result: {'dataset': 'CA-HepTh', 'query_ratio': 0.05, 'surrogate_test_auc': <number>}
     Pruning result: {'dataset': 'CA-HepTh', 'prune_ratio': 0.2, 'test_auc': <number>, 'watermark_auc': <maybe None>}
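
To give reviewers a feel for the pruning step at prune_ratio=0.2, here is a small NumPy sketch of global magnitude pruning (an assumption about the mechanism; the ported attack may prune differently):

```python
import numpy as np

def magnitude_prune(weights, prune_ratio):
    # Zero out the prune_ratio fraction of entries with the
    # smallest absolute value, pooled across all weight matrices.
    flat = np.concatenate([w.ravel() for w in weights])
    k = int(prune_ratio * flat.size)
    threshold = np.partition(np.abs(flat), k)[k] if k > 0 else -np.inf
    return [np.where(np.abs(w) < threshold, 0.0, w) for w in weights]

rng = np.random.default_rng(0)
ws = [rng.normal(size=(4, 4)), rng.normal(size=(2, 2))]  # 20 entries total
pruned = magnitude_prune(ws, 0.2)
sparsity = sum((w == 0).sum() for w in pruned) / 20
print(round(sparsity, 2))  # 0.2
```

The attack then re-evaluates test_auc (and watermark_auc, if a watermark is present) on the pruned model.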

What I observed during local testing (so reviewers know what to expect)

  • If no checkpoint is provided, both the extraction AUC and the post-pruning AUC sit at ~0.5 (random), which is expected since the teacher is untrained.

  • After training the small demo teacher (examples/train_small_predictor.py) and supplying that checkpoint:

    • surrogate_test_auc can increase (example observed ≈ 0.70 on the tiny demo teacher).
    • test_auc after pruning can be significantly >0.5 depending on the demo checkpoint (observed ≈ 0.79 during local runs).
  • The current implementation is a smoke/demo implementation — it is not a full, large-scale reproduction of the GENIE paper experiments (no large hyperparameter sweeps, multiple seeds, or large dataset jobs included).
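
The ~0.5 baseline is just the AUC of random link scores. A quick rank-based sanity check (plain NumPy here, not the repo's metric code):

```python
import numpy as np

def auc(labels, scores):
    # Rank-based AUC (Mann-Whitney U): the probability that a
    # random positive edge is scored above a random negative one.
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = int(labels.sum())
    n_neg = len(labels) - n_pos
    pos_rank_sum = ranks[labels == 1].sum()
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, 20000)
print(auc(labels, rng.random(20000)))  # close to 0.5: scores carry no signal
print(auc(np.array([0, 0, 1, 1]), np.array([0.1, 0.2, 0.8, 0.9])))  # 1.0: perfect ranking
```

So the observed jump from ~0.5 to ~0.70 after training the demo teacher indicates the surrogate is genuinely learning from the teacher's responses.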

Important limitations & notes

  • This PR adds a PyG fallback so reviewers who cannot install DGL can run the demo. DGL codepaths remain in the repository; if DGL is available, you can still use them.
  • Checkpoint loading is “best effort.” If you feed it a checkpoint from a different model definition, adapt the loader in models/gcn_link_predictor.py to match your state-dict keys.
  • The ported attack code aims for clarity and compatibility with PyGIP’s BaseAttack API — it preserves algorithmic intent but is simplified for readability and reproducible smoke runs.
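
"Best effort" loading typically means remapping keys and calling load_state_dict with strict=False, so a mismatched checkpoint degrades gracefully instead of crashing. A minimal sketch of that pattern (the "module." prefix strip is an illustrative guess at a common DataParallel-style mismatch, not necessarily what the loader does):

```python
import os
import tempfile

import torch
import torch.nn as nn

def load_best_effort(model, ckpt_path, strip_prefix="module."):
    # Load whatever keys line up; with strict=False, mismatched keys
    # are reported instead of raising an error.
    state = torch.load(ckpt_path, map_location="cpu")
    state = {k[len(strip_prefix):] if k.startswith(strip_prefix) else k: v
             for k, v in state.items()}
    result = model.load_state_dict(state, strict=False)
    if result.missing_keys or result.unexpected_keys:
        print("missing:", result.missing_keys, "unexpected:", result.unexpected_keys)
    return model

# Demo: save a prefixed state dict, then load it into a fresh model.
src = nn.Linear(4, 2)
path = os.path.join(tempfile.gettempdir(), "demo_ckpt.pth")
torch.save({"module." + k: v for k, v in src.state_dict().items()}, path)
dst = load_best_effort(nn.Linear(4, 2), path)
print(torch.equal(dst.weight, src.weight))  # True
```

If a reviewer's checkpoint uses entirely different layer names, the remap dictionary is the place to adapt.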
